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t~ 5 Abstract 

RANSAC is a popular technique for estimating model parameters in 
the presence of outliers. The best speed is achieved when the minimum 
possible number of points is used to estimate hypotheses for the model. 
Many useful problems can be represented using polyomial constraints (for 
instance, the determinant of a fundamental matrix must be zero) and so 
ryj have a number of soultions which are consistent with a minimal set. A 

O considerable amount of effort has been expended on finding the constraints 

of such problems, and these often require the solution of systems of poly- 

I nomial equations. We show that better performance can be achieved by 

^. using a simple optimization based approach on minimal sets. For a given 

£SJ minimal set, the optimization approach is not guaranteed to converge to 

C^) the correct solution. However, when used within RANSAC the greater 

speed and numerical stability results in better performance overall, and 

much simpler algorithms. We also show that by selecting more than the 

r — minimal number of points and using robust optimization can yield better 

results for very noisy by reducing the number of trials required. The in- 
f^**) creased speed of our method demonstrated with experiments on essential 

matrix estimation. 
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1 Introduction 

Many computer vision systems operating on video require frame-rate opera- 
tion in order to be useful. This paper is concerned with estimating parameters 
(in particular, the essential matrix) with the greatest possible efficiency. For 
RANSAC [3] schemes to be efficient, it is important to be able to estimate 
model hypotheses using the smallest possible amount of data, since the prob- 
ability of selecting a set of datapoints without outliers decreases exponentially 
with the amount of data required. A collection of datapoints of the minimum 
required size is known as a minimal set. In some computer vision problems, 
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(such as essential matrix estimation [TT] and image stitching with radial distor- 
tion [2]), the data describe a system which is subject to a number of polynomial 
constrains. Therefore, direct minimal set algorithms involve finding the solution 
of polynomial sets of equations. 

This paper is about using iterative solvers instead of direct polynomial 
solvers and is motivated by the following observations: 



• 



Minimal set algorithms are useful when one is performing robust estima- 
tion using RANSAC or a related scheme. 



• In RANSAC, speed matters even at the expense of quality. If one 

can conceive of an optimization which quadruples the number of hypothe- 
ses which can be generated and tested within a given time budget, even 
if three fifths of the hypotheses are bad, there is still a net increase in 
performance. 

• Finding the roots of high-degree polynomials is notoriously hard [TS] as 
the numerical stability of roots is very poor. Therefore, even direct solvers 
will not necessarily converge to correct solutions. 



• 



• 



There is no escape from iterative algorithms, as there are no general closed- 
form solutions for polynmials of degree five and higher. 

If one picks a super-minimal set of points, the probability of having at 
least a minimal number of inliers is much higher. 



• There are many problems for which no known direct minimal algorithms 
exist. 

Therefore, we propose two approaches: 

1. Pick a minimal set and the model using a simple, unconstrained nonlinear 
optimizer. See Algorithm [T] and [2] in Section [2] 

There are a number of theoretical trade-offs beteween optimization and poly- 
nomial based approaches. Both methods may not yield the correct answer even 
with a minimal set of inliers. 

Optimization is numerically stable, but gives at most one answer, whereas 
polynomial methods will not converge successfully if the correct root is poorly 
conditioned. In some important cases (e.g. essential matrix estimation) opti- 
mization has three advantages: the algorithm is simpler, faster and more nu- 
merically stable. 

2. Pick a super-minimal set and estimate the model using a robust algorithm 
such as iterative reweighted least squares. See Algorithm [3] in Section [2J 

In the presence of high outlier levels, the probability of having at least enough 
good points within the super-minimal set is much higher than the probability 
of picking a minimal set of only good points. This difference becomes high 



enough to outweigh the slow speed and poor convergence of robust optimization. 
An analogy can be drawn to forward error correction: by using a redundant 
representation (the super- minimal set), errors can be tolerated and it does not 
matter where the errors occur. 

In this paper, we apply these methods to the estimation of essential matrices. 
Essential matrices are often found from as set of correspondences between points 
in two images of the same scene. They are to estimate efficiently because the 
data from point correspondences contains outliers, and the minimal set is quite 
large (5 points). 

Robust estimation methods such as M-estimation [HI 0(22 ? case deletion and 
explicitly fitting and removing outliers [TU], can be used but these often only 
work if there are relatively few outliers. So the essential matrix is often found 
using some variant of RANSAC 3, 22 (RANdom S Ample Consensus) followed 
by an iterative procedure such as M-estimation in order to robustly minimize 
the reprojection error using all the data. 

The essential matrix has five degrees of freedom and the minimal set is five 
point matches. The five matches yield up to 10 solutions (see e.g. [J] for a recent 
proof) . A number of practical algorithms have been proposed [I3J [23] , the most 
prominent of which (due to its efficiency) is the '5-point algorithm', proposed 
by Nister [TT] , 

The 5-point algorithm involves setting up and solving a system of polyno- 
mial equations, so a number of related alternatives have been proposed which 
generally attempt to simplify or sidestep that process. 

A number of related alternatives have been proposed, which trade speed 
for simplicity. For instance, Grobner bases can be used to solve the polynomial 
equation system [2T] (requiring a 10 x 10 eigen decomposition), as can the hidden 
variable resultant method [9] or a nonlinear eigenvalue solver [5]. The problem 
of solving sets of polynomial equations can be sidestepped by reformulating the 
problem as a constrained function optimization pQ. 

Some approaches to getting faster performance make use of constrained mo- 
tion [T2] HE] in order to reduce the size of the minimal set required. These are 
therefore not applicable to general use. 

We compare our algorithms to the 5-point algorithm (Algorithm El). Since 
speed is critical in determining performance, we describe our implementations 
of the nonlinear optimization and 5-point algorithms in Sections [2] and [3j re- 
spectively, in addition to providing the complete source code as supplemental 
material. Results are given in Section [4j 

2 Optimization based solvers 

To optimize an essential matrix, we use a minimal (i.e. 5 degree of freedom), 
unconstrained parameterization related to the one presented in |20j . An essential 
matrix can be constructed of a translation and a rotation: 

E=[t\ x R, (1) 



where t is a unit vector, R is a rotation matrix and [t] x is a matrix such that for 
any vector v € K 3 , [t] x v = t x v. Given two 2D views of a point in 3D, as the 
homogeneous vectors p and p', the residual error with respect to an estimated 
essential matrix E is given by: 



l' T Eq. 



(2) 



We represent R with the 3-dimensional Lie group, SO(3) (see e.g. [171 1^1)- 
With the exponential map parameterization, wc choose the three generators to 
be: 

Gi = 

By taking infinitesimal motions to be left multiplied into R, the three derivatives 
of r with respect to R are: 
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We parameterize t using a rotation so that t = R t [l 0] T , with infinitesimal 
motions right multiplied into Rt- The remaining two derivatives are therefore 
given by: 

RtG t o Rq, *G{1,2}, (4) 

. J J x 

since [1 0] T is in the right null space of G3. Note that the resulting optimization 
does not need to be constrained. The epipolar reprojection errors (the distance 
between a point and the corresponding epipolar line, not the 'gold-standard' 
reprojection error), g are given by: 

(5) 
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Algorithm 1 Pick a minimal set of points, a random translation direction, a 
random rotation and minimize the sum of squared residual errors (Equation Q) 
using the LM (Levenberg-Marquadt) algorithm. Hypotheses that fail to converge 
quickly or converge with a large residual error are rejected. 
Comment: This is the standard algorithm for solving least-squares problems. 
In practise, the method is very insensitive to the choice of inital rotation. This 
technique yields zero or one solutions. 

Algorithm 2 Pick a minimal set of points and minimize the sum of squared 
residual errors using Gauss-Newton, abandoning hypotheses which do not con- 
verge sufficiently quickly. 

Comment: Although Gauss-Newton does not converge as effectively and 
reliably as LM, the low overhead means that the algorithm can converge in 
much less time. Additionally failure can be very fast, so little time is wasted on 
cases where the optimization may be very slow. In our tests with real data, 
this is the best performing algorithm. 



Algorithm 3 Pick a non minimal set and minimize the reweighted sum-square 
epipolar reprojection error (Equation^ using LM. Hypotheses with a large resid- 
ual error are rejected. 

Comment: A non-minimal set has a higher probability of containing at least 
five good matches compared to a minimal set. Since reweighted least-squares is 
robust to outliers [14] , the algorithm can converge to the correct answer even in 
the presence of errors. 

3 Our implementation of the 5-point algorithm 

Our implementation follows the implementation in |llj . with some differences 
which improve the speed/accuracy trade-off. Below is an outline of the algo- 
rithm, with our modifications highlighted in oblique text. Given two views of a 
point in 3D, q and q', the points are related with the essential matrix E: 

q'£q = (6) 

By defining E = [E n , E 12 ,E 13 ,E 2 i, •••] T and q= [q[qi, q[q 2 , q[q 3 , q' 2 qi,---}, 
Equation [6J can be rewritten as the vector equation qE = 0. Stacking five sets 
of equations from five different points gives the homogeneous set of equations 
QE = 0, where Q is a 5 x 9 matrix. The elements of E, E lie in the four 
dimensional null space of Q. If we were to extract the null space using singular 
value decomposition, it would be the single most expensive part of the algorithm. 
Since later stages of the algorithm do not require an orthonormal basis for the 
null space, we have found that best performance is achieved by using Gauss- 
Jordan reduction. Using elementary row operations, Q is reduced to [/|A]. 

Since: 

~ A' 



[I\A] 



0, (7) 



the matrix [A T \ — I] spans the null space of Q. The computational cost is that of 
computing a 5 x 5 matrix inverse. Since E can be written as a linear combination 
of the four vectors spanning the null space: 

E r = [x y z 1] [A r \ - I] , (8) 

what remains is to find x, y and z. 

There are 10 cubic constraints on an essential matrix given by \E\ = and 
and 2EE T E — trace(EE T )E = 0. Substituting in Equation py gives a system of 
homogeneous polynomial equations which can be written as a 10 x 20 matrix (M) 
multiplied by the monomial vector, [x 3 , y 3 , x 2 y, xy 2 , x 2 z, x 2 ,y 2 z, y 2 , xyz, xy, xz 2 , 
xz, x 7 yz 2 7 yz, y, z 3 7 z 2 ,z, 1]. We have found that the most efficient way of com- 
puting the entries of M is to use a computer algebra system to emit C code to 
build M directly (see the supplemental material). 

Gauss- Jordan reduction is applied to M , and a smaller matrix C(z) can be 
extracted which satisfies the homogeneous equations C(z)[x y 1] T = 0, where 
the elements of C{z) are degree 3 and 4 polynomials in z. Since C has a null 



space, its determinant must be zero. Valid essential matrices for the five matches 
are found by finding the roots of the degree 10 polynomial d corresponding to 
d(z) = \C(z)\. 

Root finding is the single most expensive part of the algorithm. As in [TTj . 
we use Sturm sequences to bracket the roots. Following the general philosophy 
of this paper, we tune our system to compute answers as rapidly as possible 
even if it incurs the penalty of missing some valid solutions. 

During bracketing, if a root is found to be at \z\ > 100, we abandon any 
further attempts to find the root, since even if z is found to machine precision, 
d will be far from zero. Additionally, we quickly abandon roots which are quite 
close to being repeated, since such roots take a long time to bracket and are 
numerically unstable. 

For root polishing, we use the hybrid Newton-Raphson/Bisection algorithm 
given in \15\ for a maximum of 10 iterations, though it usually converges in 
fewer than 6 iterations. If after 10 iterations or convergence, the value of d is 
too far from zero, then the root is simply discarded. 

Algorithm 4 Pick a minimal set and find all valid essential matrices using the 
algorithm described above. 

Comment: This is the five point algorithm as described in [11 with some 
further speed optimizations. This technique yields zero to ten solutions. 

4 Experiments and results 

In this section results are given for all algorithms on a variety of synthetic and 
real data. The total computation required for the experiments was approxi- 
mately 500 CPU hours. For the robust algorithm (Algorithm [3]) we found that 
10 point matches gave the best results. The results are computed in terms of 
reliability, which is defined as the proportion of essential matrices found cor- 
rectly. 

4.1 Synthetic data 

Synthetic frame pairs are generated for a camera with a 90° field of view, with 
translations up to 1 unit and rotations up to 35° with the following method: 

First, generate a point cloud so that points are distributed uniformly in the 
first camera in position and inverse depth, starting depth of 1 unit. Second, 
generate a random transform matrix and transform points to the second cam- 
era. Then add Gaussian measurement noise (cr = 0.001 units) to the projected 
position of the points in both cameras and remove any points no longer visible. 
Finally, generate a set of point matches from the points, create some mismatches 
(i.e. outliers) and randomize the order of the points. Regardless of the camera 
transformation, a set number of good and bad matches are created. 

From the data, we generate a fixed number of hypotheses and find the 
best one using preemptive, breadth first RANSAC [10]. The best hypothesis 



is then optimized on all the data using iterative reweighted least squares with 
the Levenberg-Marquadt algorithm. Unless specified, the results are shown with 
the best preemptive RANSAC block size [TO] . 

The total time is measured and averaged over 10,000 transformation matri- 
ces. The final essential matrix is classified as correct or incorrect based on the 
RMS (root mean square) reprojection error on the known inkers. The reliability 
is then computed as the number of hypotheses is increased. 

The results are shown in Figure 111 with the time required for a given relia- 
bility plotted against the inlier fraction. As can be seen Algorithm [l] is the best 
performing with moderate proportion (up to 80%) of outliers, outperforming Al- 
gorithm|4]by about a factor of 1.5. In very low outlier situations, all algorithms 
behave similarly, because other considerations (such as the final optimization) 
start to dominate, though Algorithm [4] has a slight edge of about 2% in some 
cases. However, even the very simple Algorithm [2] performs very nearly as well 
in these circumstances. 

It is interesting to note that with low outlier densities it is better to have 
few point matches but at high outlier densities, it is better to have more. As 
one might expect, the optimal number of points decreases as the reliability 
requirement is relaxed. 

Algorithm [3] is not shown in Figure [T] since it significantly undcrperforms 
the other algorithms in this regime. However, with a very high proportion of 
outliers, the improved probability of picking a set of matches with at least 5 
inliers exceeds the relative slowness and low reliability of the algorithm, causing 
it to dominate. This in shown in Figure [2j 

Another interesting point to note is that a reduction in the number of hy- 
potheses generated by RANSAC does not always reduce the processing time! 
A striking example of this is shown in Figure [2} If a good starting point is 
found, then the final robust optimization converges very quickly. However, if 
a good starting point is not found, then the optimization can take a long time 
to converge, and this computation dominates. The effect is less pronounded in 
high noise situations, eventually disappearing completely. 

In Figure [2] A, all algorithms perform about equally well for 30% inliers. By 
comparison, Algorithm[T]evaluates about 800 hypotheses, Algorithm[2]evaluatcs 
about 2,800 and AlgorithmBlevaluates about 2,700 (and tests about 450 minimal 
sets). 

4.2 Real data 

We generate real data by running a camera along a rail so that the direction of 
motion is known. Reconstructed essential matrices must be classified as correct 
or incorrect, and this is done by thesholding on the angle between the known 
translation direction and the reconstructed translation direction. We choose 10° 
as the threshold for the results shown, though the results are similar for a range 
of thresholds. We do not threshold on rotation, since there is some wobble in 
the motion of the camera. The setup is arranged so that the camera either has 
horizontal translation or translation along the optic axis. Some sample images 
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Figure 1: Graphs showing the time required to estimate an essential matrix with 
a given reliability plotted for a given fraction of inliers. The plots are shown for 
Algorithm [l](LM), Algorithm |](GN) and Algorithm |4](5-point). The number 
in the legend denotes the total number of point matches. Note that for 250 
points per frame, all of the algorithms require greater than 10 seconds to find a 
correct essential matrix with 99% reliability. 
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Figure 2: Plots of reliability against time for 1000 datapoints. Left: 10% and 5% 
inliers with a block size of 100. In this regime, Algorithm[3]is the best performing 
algorithm if the computational budget is limited. Right: different block sizes, 
with 60% inliers. Curve is parameterized with the number of hypotheses. In 
this regime, the time spent in the nonlinear optimization at the end dominates. 
Using a moderate number of hypotheses is faster overall than using a small 
number of hypotheses since the time spent in the optimization is reduced. 
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Figure 3: A-D: dataset 1, dominant planar, textured structure. A, B optic axis 
motion. C, D horizontal motion. E-H: dataset 2: no dominant plane or object 
position. E, F optic axis motion. G, H horizontal motion. 



are shown in Figure [31 No a priori knowledge of this is used in any of the 
algorithms. 

The point correspondences are generated from a system which is designed 
representative of a typical frame-rate vision application: 

1. Generate an image pyramid with the scalings {l,|,|,|,j,...} since these 
ratios can be generated very efficiently. 

2. Perform FAST-9 [TB] on each layer of the pyramid and extract a feature 
for each corner. 

3. For each corner in the frame at time t, find the best match in the frame 
at time t — N , with no restriction on matching distance. 

4. The best 20% of matches are retained and their order is randomized. 

We take N £ {1..60} to increase the baseline, making the number of frame pairs 
tested about 15,000. Note that the value of N is not used in the creation of any 
of the results below. The essential matrix is then estimated using RANSAC 
followed by an M-estimation step. We extract the translation and rotation 
using Horn's method [5 j and triangulate points to determine which of the four 
combinations of rotation and translation to use. 

Results for dataset 1 are shown in Figure HI As can be seen, unlike in the 
synthetic data, the simplest algorithm employing a Gauss-Newton optimizer 
exhibits exceptional performance compared to the other algorithms. The per- 
formance increase relative to the LM optimizer is because the GN optimizer 
very quickly abandons sets of points which are hard to optimize. As a result, it 
tests many more minimal sets than the LM optimizer. 

With the optic axis motion, the five-point algorithm performs much better. 
This motion is problematic for optimization based techniques since they are 
prone to local minima because the movement of the epipoles causes dramatic 
motion of the epipolar lines. A direct method such as the five-point algorithm 
does not suffer from this effect. 

The results show that the estimation of the essential matrix is particularly 
difficult when the optic flow is small. As can be seen, the reliability decreases 
slowy (or even increases) with increasing optic flow, even though the inlier rate 
drops significantly. The drop in inlier rate would cause a very large drop in 
performance if the reliability of the algorithms did not increase dramatically 
with inlier rate. 

As the pixel motion gets large, the inlier rate drops since feature point match- 
ing becomes more difficult. In these regimes, Algorithm [3] (robust estimation) 
shows some significant improvements over the other algorithms. 

The experiments on dataset 1 were repeated with higher corner detection 
thresholds giving 300, 200 and 100 retained matches per frame. The per- 
formance generally decreased with increasing thresholds, but the trends were 
largely unaffeced. 



Results for dataset 2 are quite similar to dataset 1. The main difference is 
that Algorithm [4] performs somewhat better relative to dataset 1 and Algorithm 
[3] performs somewhat worse. 

Finally, we repeated the experiments with a less accurate camera calibration. 
The inaccuracy caused a slight performance decrease across all algorithms. No 
algorithm appeared to be significantly less stable than any other in the presence 
of small calibration errors. 

5 Conclusions 

In general, reliable estimation of essential matrices remains very difficult prob- 
lem. On real data, the simplest algorithm generating RANSAC hy- 
potheses by minimizing residuals of a minimal set with Gauss-Newton 
(Algorithm [2]) — outperforms the other algorithm by a wide margin. 

On the synthetic data, LM (Algorithm [If , GN (Algorithm [2} and the five 
point algorithm (Algorithm B| perform similarly, with Algorithm [I] winning by 
a relatively wide margin with high outlier densities and Algorithm [4] winning 
by a small margin at low outlier densities. We also note that as expected, 
the performance of the robust algorithm (Algorithm [3]) is best when the outlier 
density is very high, proving to be the most suitable algorithm in high noise-time 
constrained situations. 

The results on real data are somewhat different and serve as a good illustra- 
tion as to the pitfalls of relying too heavily on synthetic data. The main point 
is that Algorithm [5] is by far the best performer when it comes to reconstruct- 
ing left-right motions. This is particularly interesting given that is also by far 
the simplest algorithm to implement. The case is less clear cut for forward- 
backward motions, with Algorithm [4] winning by a considerable margin in some 
cases. Additionally, Algorithm [3] can perform better than all other algorithms 
in high noise situations. 

In conclusion, for essential matrix estimation, simple optimization with Ga- 
uss-Newton (Algorithm [2]) is the best performing algorithm, giving the most 
consistently reliable results, especially in time-constrained operation. If com- 
putation time is not at a premium, then the best results would probably be 
achieved by a system which draws hypotheses from Algorithm [5] and Algorithm 
[4j These results also have wider applicability: simple, fast and numerically sta- 
ble iterative algorithms can be used for generating hypotheses for RANSAC in 
many situations, including those where currently complex, direct solutions are 
used and those for which no direct solutions are known. 
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Figure 4: Top half: results from dataset 1, ~ 500 matches retained per frame. 
Bottom half: results from dataset 2 (~ 300 matches). Within each half: Top 
row: horizontal motion. Bottom row: optic axis motion. Left: reliability against 
time aggregated over all data with at least 10 pixels of motion (inlier rate of 
about 0.25). Centre and Right: reliability plotted against average pixel motion 
for 1 second per frame and 10ms per frame. Note that the inlier rate is not 
constant, so it has been shown. 
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